perm filename ORAL[TLK,DBL]2 blob sn#219355 filedate 1976-06-11 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00026 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00003 00002	.!XGPCOMMANDS←"/TMAR=50/PMAR=2100/BMAR=50"
C00005 00003	5↓_Discovery in Mathematics_↓*
C00008 00004	.S(Introduction)
C00011 00005	.S(Discovering the Primes)
C00018 00006	.S(We Did a Problem-Reduction)
C00023 00007	.S(This General Sequence of Reductions)
C00031 00008	.S(Introduce Heuristic Guidance)
C00035 00009	.S(We Write a Theory Formation Pgm)
C00037 00010	.S(Mathematical Theory Formation)
C00042 00011	.S(Writing AM)
C00053 00012	.S(What AM Does)
C00062 00013	.S(Action 1: Fill in entries of facets)
C00065 00014	.S(Action 2: Creating new concepts)
C00067 00015	.S(Action 3: Proposing a new job)
C00078 00016	.S(Locating Relevant Heuristics)
C00088 00017	.S(Cardinality Example)
C00091 00018	.S(Finding Examples of Equality)
C00098 00019	.S(Generalizing Equality)
C00101 00020	.S(Example 2)
C00106 00021	.S(Results)
C00114 00022	.S(Experiments With AM)
C00121 00023	.S(Maximally-Divisible Numbers)
C00127 00024	.S(Conclusions)
C00130 00025	.S(Supplement)
C00135 00026
C00136 ENDMK
C⊗;
.!XGPCOMMANDS←"/TMAR=50/PMAR=2100/BMAR=50";
.DEVICE XGP
.FONT 1 "BDR40"
.FONT 2 "NGR40"
.FONT 3 "FIX25"
.FONT 4  "BDI40"
.FONT 5  "BDR66"
.FONT 6  "NGR25"
.FONT 7  "NGR20"
.FONT 8 "SUP"
.FONT 9 "SUB"
.TURN ON "↑α↓_π[]{"
.TURN ON "⊗" FOR "%"
.TURN ON "@" FOR "%"
.PAGE FRAME 41 HIGH 83 WIDE
.AREA TEXT LINES 1 TO 39
.AREA FOOTING LINE 41
.COUNT PAGE PRINTING "1"
.TABBREAK
.ODDLEFTBORDER←EVENLEFTBORDER←1000
.AT "<<" ENTRY ">" ⊂ 
⊗4<Still to do: ENTRY⊗*
. ⊃
.MACRO B ⊂ BEGIN VERBATIM GROUP ⊃
.MACRO E ⊂ APART END ⊃
.MACRO FAS ⊂ FILL ADJUST SINGLE SPACE PREFACE 1 ⊃
.MACRO S(ENTRY) ⊂ FAS SKIP 2
.IF LINES≤10 THEN START SKIP TO COLUMN 1 END;
.ONCE CENTER SELECT 5
*** ENTRY ***
. ⊃
.FAS
.COMPACT
.SELECT 1
.PORTION THESIS
.PAGE←-1
.NEXT PAGE
.INDENT 0
.TURN OFF "{∞→}"   
.BEGIN CENTER 
⊗5↓_Discovery in Mathematics_↓⊗*
⊗5↓_as Heuristic Search_↓⊗*


Ph.D. Oral Exam
Wednesday, May 19, 1976


⊗2Douglas B. Lenat⊗*
Artificial Intelligence Lab
Computer Science Department
Stanford University



.END


A program,  called  "AM", is  described which  models  one aspect  of
elementary  mathematical research: developing new  concepts under the
guidance of a large body of heuristic rules.

The local heuristics  communicate via an  agenda mechanism, a  global
list of tasks for the system to  perform and reasons why each task is
plausible.  A single task might direct AM to define a new concept, or
to explore  some facet  of an existing  concept, or  to examine  some
empirical  data  for  regularities,  etc.   Repeatedly,  the  program
selects from the agenda the task having the best supporting  reasons,
and then executes it.

Each concept is  an active, structured  knowledge module.   A hundred
very   incomplete   modules   are   initially   provided,  each   one
corresponding to an elementary  set-theoretic concept (e.g.,  union).
This  provides a  definite but  immense  "space" which  AM begins  to
explore.   AM  extends its  knowledge base,  ultimately rediscovering
hundreds  of common  concepts  (e.g.,  numbers) and  theorems  (e.g.,
unique factorization).

This approach to  plausible inference contains some unexpected powers
and limitations.

.SKIP TO COLUMN 1

.EVERY FOOTING(,--- {PAGE} ---,)

.S(Introduction)

This talk
will fall naturally into 3 separate parts.
The last two of these will deal specifically with one
heuristic program,  AM, which
emulates  some  of  the  activities  involved  in  doing  mathematics
research: namely, AM investigates  a given collection of objects  and
operators, and  gradually enlarges it  by defining new  concepts. All
this is  accomplished under the guidance of a large body of heuristic
rules.

But before discussing AM,  let's spend a  few minutes thinking about  how
scientific discoveries are made;  at least, about how they're made in
very elementary mathematics.

We'll gather a few hints from that analysis, and use them to  propose
a simple ⊗4model⊗* of how mathematical theories are developed.

At that time  we can turn  to AM, and view  it as a  computer program
embodying that model.

The first thing  I want to stress is the difference between solving a
given problem  and  thinking it  up in  the  first place.    Contrast
playing Monopoly with inventing new games  of the same quality.   Or
contrast  solving the missionaries-and-cannibals problem with the
ill-defined reasoning which led to formulating it.

Turning  to mathematics:  if  I  show  you the  definition  of  prime
numbers, <SLIDE primes> you can find some for me without any trouble.
Whoever first formulated that concept performed a much harder task.

.S(Discovering the Primes)


How did he do it?  How can we rationally account for the discovery of
prime  numbers?  We'd like  to propose a plausible  scenario in which
someone decides to  look at these numbers,  and then decides  they're
interesting and worth naming.

Imagine that  a bright undergraduate  math major has been  given some
formal  system of objects and operators,  which he doesn't realize is
isomorphic to sets, set operations, numbers, counting, and arithmetic.
He's  told  to  explore that  formal  system,  finding  out some  new
relationships between those  concepts which  were given  to him,  and
encouraged to  define  some new  concepts if  they  appear useful  or
interesting to the student/researcher.


In that context,  let me suggest to you one plausible scenario of the
discovery of prime numbers.

Suppose the  researcher has  just  been playing  with the  notion  of
"divisors-of" of  a number.  One  rule of thumb he  sometimes follows
when devising  new theories is to consider extreme cases. <SLIDE goto
extremes> In more precise terms,  this rule says that if  you've just
been studying an interesting function f which takes elements of set A
into those of set  B, and there  is some extremal  subset of B,  it's
worth taking the time to consider which elements from A map into that
distinguished subset.

In  the current  situation,  the function  f is  "Divisors-of".   For
example, <SLIDE: Heur overlay> it takes the number 12 into the set of
numbers which divide evenly into 12, namely {1,2,3,4,6,12}. So f maps
a number  into a set of numbers.  An interesting or extreme subset of
B would  thus  be  some extreme  kind  of set.    We have  names  for
extremely   small  sets,   for  example:   empty   sets,  singletons,
doubletons, tripletons.   The  rule  of thumb  thus says  it's  worth
considering the set of  numbers over here which map  into empty sets.
So it's worth  isolating those numbers which map into singletons; the
set of numbers whose  set of divisors  is a doubleton; numbers  whose
image under the function divisors-of is a tripleton.

In other  words, this  rule has suggested  defining and  studying the
following  classes of numbers:  the set of numbers  with no divisors,
with only 1  divisor, with  2 divisors <SLIDE  n divisors>, and  with
three  divisors.   A  priori,  any  or all  of  these  sets could  be
interesting.  It will turn out that ⊗4this⊗* set (2 divisors) is  the
most interesting and useful one. Why?

Just  the definitions  don't  help.  The  researcher might  take  the
trouble to  examine all the numbers less  than a thousand, let's say,
and place them in the appropriate set, depending on how many divisors
they had.
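In modern terms, that bookkeeping can be sketched as follows.  This is
only an illustration of the extremals heuristic applied to divisors-of,
not any code from AM itself; the function names are my own.

```python
def divisors_of(n):
    """Return the set of positive divisors of n, e.g. 12 -> {1,2,3,4,6,12}."""
    return {d for d in range(1, n + 1) if n % d == 0}

def group_by_divisor_count(limit):
    """Apply the extremals heuristic: bucket each number from 1..limit
    by the size of its image under divisors-of."""
    groups = {}
    for n in range(1, limit + 1):
        groups.setdefault(len(divisors_of(n)), []).append(n)
    return groups

groups = group_by_divisor_count(50)
# groups[1] is just [1]; groups[2] is what we call the primes;
# groups[3] turns out to be the squares of primes.
```

Running this on all the numbers up to a thousand, as the researcher
did, singles out the 2-divisor bucket (the primes) and exhibits the
3-divisor bucket as the perfect squares of its members.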

Well, these  two sets of numbers  are very small and  there's nothing
too  interesting about them.  Now we  turn our attention to these two
sets  of numbers...  Nothing  obviously  interesting about  these  (2
divisors), but these (3 divisors)  seem to all be perfect squares.  A
natural question to ask is:  what are square-roots of these  numbers?
We then notice that all the members here just  seem to be the squares
of the  members from this set.  This may cause us  to spend some more
time studying both these sets.

After examining some more  empirical data, the researcher may  notice
that all numbers seem to factor  uniquely into numbers drawn from this
set.   This is far from  trivial, and we'll go  into how he does this
later on.  He'll  conjecture the unique factorization  theorem <SLIDE
UFT> and notice that it can be made much less awkward if we give this
set a name, <SLIDE: Primes overlay> say PRIME NUMBERS.

Once he's got the  definition of primes, and  he's got a couple  good
reasons for giving them  a name, theorems involving primes,  we shall
say that he's actually ⊗4discovered⊗* them.

.S(We Did a Problem-Reduction)

We haven't completely  accounted for the discovery of  prime numbers,
we've just reduced  our problem from "how in the world could somebody
invent the notion of prime  numbers" to the somewhat simpler task  of
"how in the  world could somebody invent the  notion of divisors-of".
<SLIDE Reduc 1> The reduction was accomplished by citing a heuristic,
which said:

⊗4"Investigate the inverse image of unusual or extreme subsets of the
range of some operation."⊗*

In this case, the  extreme subset was "doubletons", and the operation
was divisors-of.  The  inverse image of doubletons  turned out to  be
the set of numbers we know as the primes.

"Looking for extreme cases" is a pretty general heuristic.

It's not a rule of  inference, it's just a rule of  thumb; it doesn't
guarantee  anything, in  the sense  that  modus ponens  guarantees to
preserve validity,  or  in  the  sense that  Waltz's  constraints  on
shadowing line drawings were absolute.

This  heuristic rule  certainly isn't  applicable or  useful all  the
time.    And yet  it's  worth remembering  and using  because,  in my
experience, it frequently leads to interesting, valuable new concepts
to  investigate.     In  the  long  run,   using  this  heuristic  is
cost-effective.

A natural and fair question, now, is "OK, so you've rationalized  the
discovery of prime numbers, but why  was the researcher investigating
the  concept `divisors-of' to  begin with?".   Have we  really gotten
anywhere?  If "divisors-of" was one of the primitive concepts  we gave
that researcher  to begin with,  then the answer  is Yes, we  have in
fact explained  how he might have discovered primes.  But assume that
we didn't give him divisors-of, just simpler arithmetic operations.

Maybe we can use the same  technique  to explain  why  the  researcher
proposed this concept,  divisors-of.  We'd like to  grab some rule of
thumb and use it to rationalize that discovery.

But  look  at  the  definition  of  divisors-of(n)  <SLIDE:  Defn  of
Divisors-of>. It's  very closely  related to  the ⊗4inverse⊗* of  the
function  Times.    A  plausible  heuristic  tactic <SLIDE:  Look  at
inverse> is to investigate the inverse of each interesting operation.
So that  rule would account  for this  discovery, reducing it  to the
discovery  of the concept  of multiplication, because  looking at the
inverse of multiplication  would lead to  defining and studying  this
concept,  divisors-of.    Actually,  one  also  has  to  notice  that
multiplication is associative and commutative, i.e., it can be  viewed
as taking a ⊗4bag⊗* or multiset of numbers as its argument.

If you  can stand it,  we might try  to continue this  reduction even
further.  How  could we rationally explain how ⊗4this⊗* concept might
have been discovered.  You might consider  discovering multiplication
as a  numeric analog to  the operation cross-product, or  as repeated
addition.   Addition could have been found as the numeric analogue of
union,  or  as repeated  successoring.    Successor  is  the  numeric
analogue of CONS or insertion.

.SELECT 1

.S(This General Sequence of Reductions)

<chain SLIDE>  We've performed  a sequence  of reductions,  <Overlay:
arrows down> from primes to divisors-of to multiplication to addition
to counting to set-insertion.   Each step was justified by some  rule
of thumb: Here  it is "consider extremals", here it  is "consider the
inverse  of an interesting relation", here and  here it is "repeat an
interesting operation".

We could continue this analysis, but at some point  the concepts we'd
have to deal with get so elementary that we feel it makes no sense to
talk about "discovering" them.  At some point, they are the primitive
concepts we initially provided the college student researcher with.


I believe that  when we hear of a  new discovery, we mentally  try to
analyze it in this  fashion ourselves.  We try to perceive a chain of
concepts, stretching from  that ⊗4latest⊗*  discovery backwards,  not
all the way to our conceptual primitives, but at least all the way to
concepts we  already are familiar with.  If we succeed right away, we
probably think that the discovery wasn't so hard, that  we could have
done it ourselves.


<Remove  down-arrow overlay> Very  often, a  new discovery  is really
just a couple  steps in this  kind of process,  one or two  heuristic
applications away from what is already known.  That is why it's often
easy for us to believe that there wasn't really so much effort needed
to make the  discovery.  If  we already know  this concept, and  this
heuristic, it almost seems obvious  once we've heard that someone has
taken this step.

Why  then don't we make  new discoveries every few  minutes?  Why are
even some 1-step advances worth publishing?  The answer  can be found
by seeing  how (according to this model)  very simple new discoveries
are made.

We're talking now about reversing the flow of time here.  Instead  of
analyzing a discovery by  breaking it into simpler and  simpler ones,
we're talking about the synthesis of new concepts, the activity which
the researcher was carrying out all along.  He was continually trying
to grow new  nodes on this tree, using whatever  heuristic rules he's
learned over the years.

The researcher has a bunch of concepts that he knows about, say a few
thousand.  I  won't even consider  his ⊗4legal⊗* choices, since  they
are  quite  astronomical in  number.   Say  he  knows  a few  hundred
heuristics he can apply.   In any situation,  then, he may  literally
have a  million  alternatives to  consider, <OVERLAY  of red  lines>.
These are not his ⊗4legal⊗* choices, but his ⊗4plausible⊗* ones; each
and every one is justified by some heuristic, like this link is.

But nature is  not kind to us,  and only a  very small percentage  of
those new  concepts will turn  out to be  interesting.  What  is even
worse, he  can't whiz through this space, stopping at the interesting
concepts, because it may require hours -- sometimes even  years -- of
research to decide whether the concept he's studying is worthwhile or
a dead-end.

Even here, where we have just a few concepts and heuristics, the tree
gets bushy  when we  try to  go in  this direction.   We could  apply
"looking  at inverse" here, (cross-product)  to decide to investigate
the  kind  of  operation  we  normally  call  projection.    Or  here
(divisors-of) even if  we decide to look for extremals,  which way do
we  go?  Why  not examine numbers  which have very  many, rather than
very  few, divisors?    Why  not  keep creating  new  operations,  by
repeating the  last one created,  thereby discovering exponentiation,
hyper-exponentiation, and so on?

You  may  give  me  a   special-case  rebuttal  for  each  of   those
suggestions, but I think that rebuttal will be based on hindsight.

The analysis procedure was a search for a path from a given node back
to primitives,  but the synthesis procedure is  blind, doesn't have a
well-specified goal, so it is combinatorially more explosive.

So it's harder to make  a valuable new discovery than to  rationalize
how someone might plausibly have  discovered something he's just told
you  about.  This explains why  discoveries sometimes seem so obvious
⊗4after the fact⊗*, even to their inventor who may have labored years
on the problem.

The same  let-down occurs  when a magician  shows you how  he manages
some magic trick, or equivalently  when an AI researcher like  myself
shows you  how his  program was  really able  to manage some  amazing
behavior.

.S(Introduce Heuristic Guidance)

Now it's clear  that this synthesis  task, exploring outward  to find
new valuable concepts, is a very explosive search process.

The standard Artificial Intelligence method for muffling an explosion
of red arrows  is to introduce purple  and green heuristic  guidance.
<2nd  OVERLAY: green/purple>:  purple strategies  highlight the  most
promising  nodes to  work on  at a  given moment, and  ⊗4wiggly green
ideas suggest  furiously⊗* one  or two operators  to try  first on  a
given node.

Recall  that each  red operator is  a rule  of thumb.   So  the green
strategies tell us, in a given situation what the best heuristics are
to apply.

From now on, when I use the term  "heuristic rule", I'll really mean
a two-sided creature like this one <SLIDE: heurs>; a
situation/action rule. It says that in some empirical situation, here
are some plausible things to spend your time doing.

Here, the rule's  IF-part, or left-hand-side, was  asking whether two
concepts have some nontrivial yet unexpected overlap. If so, the rule
concludes that it's worth the trouble to isolate and explicitly study
that overlap.
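A two-sided rule of this sort can be sketched in modern terms (this is
a rough illustration, not AM's INTERLISP; the functions and the sample
data are my own assumptions):

```python
# A heuristic rule has an IF-part that tests an empirical situation,
# and a THEN-part that says what is worth spending time on.

def nontrivial_overlap(examples_a, examples_b):
    """IF-part: do two concepts share some, but not all, of their examples?"""
    common = examples_a & examples_b
    return bool(common) and common not in (examples_a, examples_b)

def isolate_overlap(examples_a, examples_b):
    """THEN-part: define the overlap explicitly, as a concept worth studying."""
    return {"definition": "intersection", "examples": examples_a & examples_b}

odds = {1, 3, 5, 7, 9, 11}
primes = {2, 3, 5, 7, 11}
if nontrivial_overlap(odds, primes):
    new_concept = isolate_overlap(odds, primes)  # the odd primes
```

The rule guarantees nothing; it merely proposes that the overlap is a
plausible thing to look at next.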

.S(We Write a Theory Formation Pgm)

The basic idea to remember from all  this is that simple mathematical
discovery  might be  successfully represented  as a  heuristic search
procedure: <SLIDE:  3  tasks> a  continual  expansion  of a  tree  of
concepts and relationships, guided  by some judgmental criteria, some
informal rules of thumb.

To  test this  hypothesis, I wrote  a computer  program, called AM,
which proposes  new  concepts in  simple  mathematical domains.  This
program  goes  through the  same  kinds  of  concept exploration  and
development that the hypothetical undergraduate researcher did.

AM is to  be a heuristic  search program, and  in the  tradition of  AI
programs of this form, there are three  separate things I should tell
you  about: the  same things  you always  have to  do to  implement a
heuristic search:

.BEGIN INDENT 0,3,0 PREFACE 0

(1) specify the initial core of knowledge that it will start with,

(2) specify the legal ways of adding to that knowledge, and

(3) collect the strategies and tactics that experts in that field use
when trying to form theories.

.END



.S(Mathematical Theory Formation)

If anyone asks at  the end, I'll discuss why mathematics  is an ideal
field in  which to try out  these ideas.  For  now, assume that we've
settled on very elementary math.  Before carrying out these  3 steps,
I should at  least mention what I mean by  a ⊗4mathematical concept⊗*
and a ⊗4theory⊗*.

A  mathematical theory <SLIDE: thy> is  built on a ↓_⊗2FOUNDATION⊗*_↓
of primitive,  undefined ↓_objects_↓,  and  ↓_operators_↓, plus  some
postulates  and  axioms  which  interrelate  them.    Each  of  these
qualifies to be called a mathematical ↓_concept_↓.

Onto this foundation is built a tower of theorems -- statements which
are  implied by  the  foundation  --  plus some  definitions  of  new
concepts in terms of preexisting ones.

This  is the final,  breath-taking architecture  that we see  in math
textbooks.   The foundation  is laid  carefully, and  the edifice  is
built smoothly, brick by brick, with no mistakes, no backtracking.

OPTIONAL:

.ONCE INDENT 5,5,5

<math  textbook-order slide>  All  undefined objects  and axioms  are
stated,  and  then an  infinite  define-state-prove loop  is entered.
Necessary lemmata are stated and proved before the theorems which use
them.    Occasionally, this  process  is  broken  by some  post-facto
motivation or real-world application.   This is entire process  never
involves backtracking.

What  they ⊗4don't⊗*  show  you  in  textbooks and  journals  is  the
unaesthetic scaffolding  that was required to  erect that tower: that
scaffolding is empirical induction and good old-fashioned search, with
guessing, back-tracking and everything.

At the very elementary levels,  at least, mathematics is an empirical
science.  The AM  program  actually proceeds  in almost  the opposite
order from textbooks,  as it develops  new relationships and  defines
new concepts.

Most of the mathematician's empirical forays are fruitless, but there
is no way to know this before  carrying them out.  His only  guidance
are his  intuitions about what  to investigate  next, his purple  and
green heuristic strategies.

This  same character of searching  a large space also  carries over to
AM.

Let me  just  pause  a  moment  and mention  that  AM  is  written  in
INTERLISP; it is about 150K big, and it lives on the SUMEX PDP-10 KI-10.  It
is a running system, and has managed to discover all the concepts and
theorems which I have mentioned so far, and the ones I will mention.

<3tasks, revisited  SLIDE> Recall  that there  is a  standard set  of
tasks to perform, to create such a system.



.SELECT 1

.S(Writing AM)

The first  step was to  list all the  concepts the system  would know
about initially. That  is, a bunch of primitive notions which are the
starting state  of the system's  knowledge.   

<SLIDE concept> Here are the hundred or so concepts that AM started with.
For reasons I'll go into if anyone asks, I chose a small set of concepts
which any mathematically-sophisticated four-year-old would have,
concepts which Piaget might have called
⊗4pre-numerical⊗*.  

Included   here  are  static   structural  concepts  like   sets  and
ordered-pairs,    and    active    ones    like    union,    reverse,
compose-2-relations, repeat, and so on.

Note that there  is no
notion of numbers or proof.

At  a  slightly  deeper  level,  we  have  to decide  precisely  what
information the system will know about each of these concepts.

Certainly we should  provide a  ↓_definition_↓ for each concept, and a name.

For each operation, we should also mention its ↓_domain and range_↓, and
give an algorithm for computing it.

What else should be
supplied initially about each concept?

To help AM decide what to do, it would be nice to supply some kind of
rough  judgment of  the worth  of each  concept: its  usefulness, its
simplicity, symmetry, etc.  Instead of grappling with the metaphysics
of this problem,  I simply attached,  ad hoc, a number to each concept
which was supposed to represent its overall worth.

If we know of any heuristic rules relevant to dealing with that type of concept,
they might be tacked onto it.


So there are many facets  to each concept, <SLIDE facets>. Some of the
ones we didn't mention yet were:
Examples,   Generalizations,   Specializations,
and Analogies.

This  set of  facets is  fixed once  and for  all.  Each  concept can
possess any or all of  these facets. But that's all we can  ever know
about a concept.  A  concept, as represented inside AM, ⊗4is⊗* simply
a bunch of facets, drawn  from this fixed set.   

In the LISP program  implementing these ideas, each concept is stored
as an atom, and its facets  are implemented as the properties on  its
property list.

Let's  consider  one  particular  concept,  and  see  what  kinds  of
information  could go into  each facet.   We'll consider  the concept
COMPOSE, which refers to the composition of two relations.  We define
AoB(x) as A(B(x)). That  is, apply B and then apply  A to the result.
<COMPOSE  slide>  <go  over  each facet>.    Here  is  an abbreviated
description of that concept.

Several equivalent definitions are provided. Some are computationally
efficient, and others are slow  but especially easy for the system
to analyze.  Each definition  is an executable LISP predicate,  which
takes 3  arguments and tries  to see whether  the composition  of the
first two is equal to the third.

The Domain/range facet notes that Compose accepts a pair of activities,
and creates a new one.

The specializations  facet points to another concept which is similar
to  Compose.   This  specialization  only composes  an activity  with
itself.

An ⊗4example⊗* of Composition is provided here.

One conjecture that  AM found and stored here  is that composition is
associative.  

Down here we have some heuristics  for dealing with compositions: an
overall rating of this concept,  how to  spot an
interesting composition  (e.g.,  if the  d-r  of the  composition  are
equal), hints  for filling in  certain facets of  any new composition
which  gets  created,  and   something  worth  checking  about   each
composition.

The format of each facet is fixed, but the formats vary from facet to
facet, from productions <heurs> to little programs <algs>, to bits of
declarative data  <d-r>.    
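The property-list representation just described might be sketched in
modern terms like this.  The facet names follow the talk, but the
data shapes, the sample relations, and the worth figure are my own
illustrative assumptions, not AM's actual structures:

```python
def is_composition(f, g, h):
    """A slow-but-analyzable definition: a predicate of 3 arguments
    that checks whether h(x) = f(g(x)) everywhere.  Small finite
    relations, stored as dicts, stand in for activities."""
    return all(h[x] == f[g[x]] for x in g)

# A concept is simply a bundle of facets, here a dict playing the
# role of an atom's property list.
COMPOSE = {
    "name": "Compose",
    "definitions": [is_composition],          # several could coexist
    "domain/range": ("Activity", "Activity", "Activity"),
    "specializations": ["Compose-with-self"],
    "worth": 300,                             # ad-hoc overall rating
    "conjecs": ["composition is associative"],
}

g = {1: 2, 2: 3}
f = {2: 10, 3: 20}
h = {1: 10, 2: 20}    # h = f o g
```

Any facet may be blank; whatever is filled in is all the system can
ever know about the concept.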

Let  me put  back the list  of concepts again,  and also  our list of
facets.<SLIDE>

I provided most of the contents of these facets by hand, for each of the
initial 100 concepts. AM  later filled out  the other facets.
Whenever  AM   created  a   new  concept,   it  had   of  course   no
"hand-supplied" facets at all for it.

.S(What AM Does)

What does  the  system actually  do, now?   From  a heuristic  search
viewpoint,  the  primary activity  of  the system  is  to  create new
concepts. These are the highly-visible "legal moves": taking existing
concepts and combining them, conjoining them, disjoining them, running
them, and modifying their definitions.  This plethora of alternatives
is too big to  consider, so AM only  creates a new concept  when some
heuristic rule  indicates that it might  be worth the  gamble. So the
heuristics act like little plausible move generators.

When AM does create a new concept, most of its facets will be  blank,
so that in  practice most of the  system's time is spent  in fleshing
out some recently-discovered concept, filling in some of its facets.

In other words, there are two ways that AM expands its knowledge. One
way is to pick an  already-known concept <SLIDE: Compose> and try  to
find  out more  about it,  by  filling in  some facet  of it  <SLIDE:
Compose.exs  overlay>. The second way AM  expands its knowledge is by
defining new concepts out of old ones <SLIDE: ∪o' overlay>.

In the LISP  implementation, that translates  to (1) filling out  the
properties  of  existing   data  structures  (here  some  examples  of
Compositions have been  found), and sometimes  (2) creating  entirely
new data structures (Here the system takes  one particular example of
Compose and promotes it from being just one entry on one facet of one
concept, into being a full-fledged concept module with facets  of its
own).

What happens when a  new concept is created?  Well,  at the moment of
its  birth, AM knows  its definition, and something  about its worth.
Also how it connects to some existing concepts.  So a few seconds are
spent filling in all the  facets whose contents are obvious and which
would be much harder to fill in later on.  Once it's created, most of
its facets will still be empty, so there will  be many new things for
the system to do.

Since the  "fleshing out" activity is quite  expensive, it would be a
mistake to just try to fill  out all the facets of all the  concepts.
For  example, AM  starts  off with  115  concepts.   Of  the 2  dozen
possible facets they might possess, only a few are filled-in for each
concept; there are 2000 blank  facets initially.  It would  take days
of  cpu time for  them all  to get filled  in.   And each time  a new
concept was proposed, an hour might be shot working just to flesh  it
out completely.  AM must manage its time more carefully.

There must be some  "fine structure" to the process of  growing a new
node onto the  tree of concepts.  This is a non-standard problem, not
faced during  your typical  heuristic search.   We  can't permit  the
system to completely fill out each new node it wants to look at.

The solution I  use is to have  AM maintain a list  of very plausible
tasks,  and repeatedly choose the most  promising of them to execute.
Each task is at the level of filling out a particular blank facet of a
particular concept. So  a few facets of some  concept will get filled
out, and the situation at that moment might then cause AM to shift to
filling in some facets of an entirely different  concept.  When a new
concept gets created,  most of its facets may remain blank for a long
period of time.

The LISP implementation of  this idea is  to have a global  job-list,
<SLIDE joblist> an  agenda of suggested activities to  do, an ordered
list  of candidates for the system's  attention.  Each entry consists
of a specific task for the system to work on, a priority value, and a
list of reasons why this is a reasonable task to do.  Except for this
list of symbolic reasons,  this is the same  as the agenda  mechanism
proposed by Winograd, Bobrow, and Kaplan for their new language, KRL.


At some  moment, there might  be hundreds of  entries on  the agenda;
this one indicates that the system should spend some time filling in
examples of the concept Primes.

This agenda governs the  top-level control structure of the  program.
The system picks the highest priority job off this list, and executes
it.

The  priority rating is  a function  of the values  of these reasons.
They in  turn  are  proposed locally,  by  the heuristic  rule  which
suggested this job in the  first place. We'll see this in more detail
in a minute.
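The agenda loop can be sketched as follows.  The talk says only that
the priority is a function of the reasons' values, so the combining
function here (a plain sum), and the sample tasks, are assumptions of
mine, not AM's actual formula:

```python
agenda = []

def propose(task, reason, value):
    """Add a task to the agenda; if it is already there for a different
    reason, record the new reason, which boosts its priority."""
    for entry in agenda:
        if entry["task"] == task:
            entry["reasons"][reason] = value
            return
    agenda.append({"task": task, "reasons": {reason: value}})

def priority(entry):
    """Priority is some function of the values of the reasons."""
    return sum(entry["reasons"].values())

def next_task():
    """Top-level control: pick the highest-priority job off the list."""
    agenda.sort(key=priority, reverse=True)
    return agenda.pop(0)

propose("fill in examples of Primes", "no examples known yet", 200)
propose("generalize Equality", "Equality is rarely satisfied", 350)
propose("fill in examples of Primes", "needed to test a conjecture", 300)
```

With these three proposals, the Primes task accumulates two reasons
and wins out over the single-reason Equality task.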


In the process of performing a job, 3 kinds of things can happen:

.BEGIN PREFACE 0 INDENT 3

(1) blank facets may get  filled in.  In the  case of this task,  the
Examples facet of the Primes concept would hopefully get filled in.

(2)  new concepts  may  get created.    Up here,  when  this task  is
executed, the resultant generalizations will be full-fledged concepts
themselves.

(3) new tasks may get added to the agenda. After filling  in examples
of primes, they might be so rare  that the system would add a task of
the form "Generalize the concept of Prime numbers".

.END


<SLIDE: 3  actions> Each of these activities is guided by a heuristic
rule. I'd like to take  a minute and illustrate how a  heuristic rule
can manage each of these kinds  of actions.  The basic idea is that a
heuristic rule triggers,  because its  left side is  relevant to  the
current state  of the world.  The right side  of the rule  contains a
list  of actions  to perform,  and those actions  are of  these three
types.

.S(Action 1: Fill in entries of facets)

The first  kind of  action that a  heuristic rule  can direct is  the
filling  in  of specific  pieces  of information  about  a particular
concept.

For example, here's a rule <SLIDE: fill exs> which gives one strategy
for  filling in  examples of  any  concept C.    It says  to find  an
operation  whose range is C, then find  examples of the domain of the
operation, and simply run that operation and gather up all the values
it returns.  Those values should then be examples of C.

Here  is how  it might be  used to  fill in  some examples  for Sets,
assuming a few were already known. It would say to find an  operation
whose range is Sets.  One such known operation is Union. Then it says
to run that operation, and the results will be examples of Sets. Sure
enough, if we do that, we do produce lots of sets.  Some of them will
be sets which weren't known before.
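That strategy for filling in examples can be sketched directly.  The
registry format below is an assumption of mine; the heuristic itself
(find an operation whose range is C, run it on known domain examples,
gather the values) is as stated on the slide:

```python
def fill_in_examples(concept, operations, examples):
    """Return new examples of `concept` produced by running operations
    whose range is `concept` on already-known examples of their domain."""
    found = []
    for op in operations:
        if op["range"] != concept:
            continue
        dom = examples[op["domain"]]
        for a in dom:
            for b in dom:
                result = op["run"](a, b)
                if result not in examples[concept] and result not in found:
                    found.append(result)
    return found

# Union is a known operation whose range is Sets.
operations = [{"name": "Union", "domain": "Sets", "range": "Sets",
               "run": lambda a, b: a | b}]
examples = {"Sets": [frozenset(), frozenset({1}), frozenset({2})]}
new_sets = fill_in_examples("Sets", operations, examples)
```

Starting from three known sets, running Union over all pairs yields
one set that wasn't known before.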

A similar kind of heuristic rule is used to find conjectures: Here is
↓_a   rule  <SLIDE:  find  conjec>   which  has  AM_↓   ask  if  a  known
generalization of concept X is  really equivalent to it. If  so, this
gets  recorded, as  an entry  on the  Conjecs facet  of  the concepts
involved.  This is the only way that AM noticed any conjectures: when
specifically directed to do so by some heuristic rule or other.

.S(Action 2: Creating new concepts)

<SLIDE: remove overlay from 3 actions> We've  seen how heuristics can
suggest  new jobs  to add  to the  agenda.   How  can they  cause new
concepts to be created?

Consider this heuristic: <SLIDE: Create-con>. It says to look for the
inverse image of extreme elements under an interesting operation.

We  already  saw  how  this  heuristic  can  lead  to  proposing  the
definition of prime numbers, when the operation F is divisors-of.

The heuristic  contains enough information to define the new concept,
and to fill in any  other information which is trivial now  but might
be  very hard  to  reconstruct later  on  (like why  the concept  was
created).

.S(Action 3: Proposing a new job)

<SLIDE: remove overlay from 3 actions> The  third kind of action that
a  heuristic rule can  initiate is to  suggest a new  task which then
gets added to the agenda.  Let's see how this happens.

A heuristic rule, like this one <SLIDE: New-job> will trigger because
its  left-hand side  is satisfied.   Somewhere  among the  actions to
take,  listed here on its  right side, is  one which explicitly calls
for a new task  to be added to the agenda.   The rule says that  if a
predicate appears to be  rarely satisfied, Then the following task is
plausible: fill in some generalized forms of that predicate.   Notice
that the rule also specifies an English phrase which is the ⊗4reason⊗*
why the  task is now plausible,  and also supplies a  way to rate the
overall worth of that reason.

The precise numbers turn out not to be important.

.SELECT 1

Let's see how  this particular rule  might be used  to suggest a  new
task for the agenda.  Say  AM recently worked on finding things which
were  Equal to each other, but that  very few random pairs of objects
turned out  to be equal.   Then  this rule  might trigger, and  would
propose this  task: <SLIDE: task1>.   If this task  wasn't already on
the agenda, it would get inserted there.

But suppose ⊗4this⊗*  <SLIDE: partial  agenda> was the  state of  the
agenda at the time.  We see right here that the task is already on the
agenda, but  for a different reason.  In such  a case, the new reason
would be  recorded, and  the priority  value of  the  task would  get
boosted.  <Full agenda> Here is the way that task would look. The new
reason has raised it to the top of the agenda.

This  turned out to  be a  surprisingly important mechanism.   When a
newly-proposed job turns  out to be  ⊗4already⊗* on the agenda,  then
its reasons  are examined.  If  the job is being  proposed for pretty
much the  same  reasons  as  before,  then  its  priority  rating  is
incremented only  slightly.  But  if it is  being proposed for  a new
reason, as in this case, then its priority is increased tremendously.
This is the practical importance of maintaining reasons for each job.
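The bookkeeping behind this mechanism is simple enough to sketch. This is an assumed structure for illustration, not AM's actual code; the task names, ratings, and the 10-to-1 split between old and new reasons are invented.

```python
# Sketch of the agenda's reason bookkeeping: each task carries a list of
# reason tokens.  Re-proposing a task with a reason it already has nudges
# its priority slightly; proposing it for a genuinely new reason boosts
# the priority sharply.

agenda = {}  # task name -> {"reasons": [...], "priority": int}

def propose(task, reason, rating):
    entry = agenda.setdefault(task, {"reasons": [], "priority": 0})
    if reason in entry["reasons"]:
        entry["priority"] += rating // 10   # same reason: slight increment
    else:
        entry["reasons"].append(reason)
        entry["priority"] += rating         # new reason: large boost
    return entry["priority"]

propose("generalize Equal", "few examples found", 400)
propose("generalize Equal", "few examples found", 400)  # repeat: +40 only
propose("generalize Equal", "user interest", 300)       # new reason: +300

best = max(agenda, key=lambda t: agenda[t]["priority"])
print(best, agenda[best]["priority"])   # generalize Equal 740
```

Note that the reasons themselves are only ever compared for equality, exactly as described above: they are opaque tokens as far as the scheme is concerned.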

AM can compare two reasons to  see if they are equal, and that's  all
it needs to do to manage this scheme.  Other than that, there is very
little  that AM can do to  a reason, except type  it out to the user.
Each reason is actually just a token, but as  you can imagine they're
quite useful for  keeping the human user aware of  what the system is
doing and why.

In general, the priority value of a task on the agenda depends on the
ratings  of all  the  supporting  reasons.   The  reasons  are  rated
locally, by  the same heuristic rules which  proposed them.  A single
⊗4global⊗* formula then looks at the job and the reasons, and assigns
it a  single priority rating from  0 to 1000.   <SLIDE: Formula> That
formula  was just guessed  at as a  first pass, and  it's worked well
enough  not  to  tamper  with  it.    It  involves  square  roots  and
dot-products of vectors and isn't really worth delving into.

Each concept, each facet, each  reason supporting a job on the agenda
has a number  attached to  it.  This  formula uses  those numbers  to
attach an  interestingness factor,  a priority  rating, a number,  to
each job on the agenda.

<SLIDE: Put  back full agenda> AM looks  over the job-list, picks the
job with the highest priority, and tries to execute it.  The  flow of
control is so distributed that it's almost unpredictable.

That priority rating  of the chosen task then dictates  how to manage
the  system's resources:  it indicates the  number of  cpu seconds to
spend before giving up and trying the next task on the agenda, and it
specifies the number of list  cells that the task is permitted to use
up before it must quit, and so  on.  (This idea was inspired by  Carl
Hewitt's notion of activation energy.)

We are assuming that there is a  self-consistent set of computational
rules  for manipulating the estimated  worth of all  the concepts and
jobs.   This is  at  best a  first-order theory  of  interestingness.
Comparing the worths  of "equality" and "primes"  is really comparing
apples to oranges.

I  think that this scheme works  adequately here only because neither
the heuristics nor the jobs are strongly coupled; they are all pretty
much independent of each other.

For example,  it doesn't really matter  which order the  top few jobs
get executed in; no one cares whether we look for primes right before
generalizing equality, or right afterwards.   What ⊗4does⊗* matter is
that  they are  near the  top, not  near the  bottom, of  the agenda.
That's why a crude kind of evaluation function is all you need.

.SELECT 1

.S(Locating Relevant Heuristics)

<SLIDE: where we are> We've seen 3 kinds  of actions a heuristic rule
can  take.  Furthermore, we've  seen how the  agenda mechanism always
keeps AM  working on  a  plausible task.   But  we're not  really  in
business until I  explain one more thing. Once a  task is chosen, how
are  the  right  heuristic  rules located,  the  ones  which  will be
relevant to the task just chosen?

There are about 250 heuristic rules altogether in AM; that's too many
to simply evaluate the left sides of ⊗4all⊗* of them.

After I explain this, I'll describe the overall behavior of AM.

Like a human mathematician, AM should only access a strategy in those
situations where  it could  plausibly be  used.   By looking  at  the
heuristics,  it seemed  that each  one  has a  very  clear domain  of
applicability, of relevance.


For  example,  <SLIDE:  compose  heur>  one  heuristic  says  that  a
composition FoG is  interesting if it preserves  some of the  unusual
properties of F and G. This is  a useful thing to know when trying to
judge the  interestingness of a composition, but it won't help you if
the current  task is  to check  examples of Lists.   We  may as  well
associate this heuristic with the concept Compose.  We'll record this
heuristic rule  inside the  Interest facet  of the  Compose  concept.
That's what this orange line means.

Other heuristics are  a little bit  more general, they relate  to any
operation <SLIDE:  Operation heurs overlay>, so  they would be tacked
onto the concept Operation.  This one is an entry on the Fillin facet
there.  This  one is on the Interest facet  of the concept Operation.
It says that any operation is interesting if its domain and range are
identical.

As AM was written, I gathered heuristics by introspection, and tacked
them onto the concept I  thought they were most relevant to.  <SLIDE:
Genl branch> The more general the  heuristic rule, the weaker it  is,
typically.  So each individual heuristic rule is tacked onto the most
general kind of entity  to which it applies.  Whenever I thought of a
heuristic, I pushed it as high as possible in this tree  of concepts.
When dealing with  any concept X, all the heuristics  attached to any
generalization of X, that is, any higher node, are also considered to
be relevant to X.

Say AM wants to see how interesting this particular concept is.  Then
AM  could  use  evaluation  heuristics  gathered from  any  of  these
concepts.  Any reason why an operation might be interesting will also
be valid here.

The  system would  ripple  upward,  in  the direction  of  increasing
generality,  collecting  heuristics  as  it  went.    They  would  be
evaluated, and their  weighted average result would  be taken as  the
estimated interestingness of this concept.
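The rippling process is easy to state precisely. Here is a minimal sketch in Python; the generalization tree and the heuristic texts are invented stand-ins for AM's 250 rules, kept only to show the mechanism.

```python
# Sketch of "rippling upward": to act on concept X, collect the heuristics
# attached to X and to every generalization above it in the tree, most
# specific first.  Only the handful of rules along that path are examined,
# never all 250 at once.

generalization = {
    "Compose": "Operation",
    "Operation": "Activity",
    "Activity": "Any-concept",
    "Any-concept": None,
}

heuristics = {
    "Compose": ["FoG is interesting if it preserves unusual properties of F and G"],
    "Operation": ["an operation is interesting if its domain equals its range"],
    "Any-concept": ["instantiate the definition to produce examples"],
}

def relevant_heuristics(concept):
    collected = []
    while concept is not None:
        collected.extend(heuristics.get(concept, []))
        concept = generalization.get(concept)
    return collected   # already ordered most specific -> most general

for h in relevant_heuristics("Compose"):
    print(h)
```

Because the list comes back ordered from specific to general, executing the heuristics in collection order automatically tries the more powerful, specific rules first.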

In  fact,   <OVERLAY>  ⊗4these⊗*  features  are   satisfied  by  this
operation,  so  they combine  to indicate  it  is a  mildly promising
concept.  Notice that the further upward we go, the less specific the
heuristics get, and the less useful they are.

If AM  wanted to  fill in generalizations  of this concept,  it would
ripple  upwards   again,   gathering  heuristics   for   filling   in
generalizations.  Any  technique for generalizing an  operation would
certainly work here too.

The  same kind of  rippling would  collect relevant heuristics  if AM
wanted to fill in examples of a certain concept, or check the entries
of the domain/range facet of this concept.


Since  the  heuristics  attached  to these  (specific)  concepts  are
typically  more powerful than  the heuristics way up  here, AM always
executes  the  heuristics in  the  order  they  are  collected.    No
sophisticated reasoning  is done to order  them in some  way prior to
obeying them.

This works [in  practice] because  the heuristics  rarely couple;  at
most, they superimpose their effects; in either case the ⊗4order⊗* in
which  they are executed is  not too important.   They don't interact
much.  Simple schemes like this only work when the components  of the
system are very  independent of each  other, when the  problem is
easily   decomposable   or   factorable  to   begin   with.     Tight
interdependence would require a more sophisticated approach.

That's  one   reason  AM  couldn't   progress  indefinitely  far   in
mathematics, at least AM as it now stands.

.S(Cardinality Example)

Let's  finally  take a  look  at  the  program in  action.    <SLIDE:
Cardinality slide>


.SELECT 1

This  is a  condensed excerpt  from a  session with  AM, as  it first
discovers the concept of numbers.  AM already knew about sets, bags,
lists,  etc., and  equality of  2  sets.   A  bag is  a multiset,  an
unordered structure with repeated elements allowed.

I'll read  through it once to show you that AM really performs magic,
then we'll peek behind the curtains to see that AM  is really no more
than  the  straight-forward data-structure  expander  that I've  been
describing.

(1) After testing  random structures  to see  if they  are equal,  AM
concludes that  it would  be worthwhile  to generalize the  predicate
EQUAL. The user asks "why" and AM mentions that few randomly selected
objects turned out to be equal to each other.

(2) Two generalizations  of EQUAL are constructed.   The first  turns
out to be  the relationship "has the same length  as", and the second
is "has the same first element as".

(3)  After  testing  random  structures  to  see  if  they  have  the
SAME-LENGTH, AM  decides to  try to  find  a canonical  form for  all
structures.   This would  mean that two  structures were of  the same
length if and only if their canonical forms were identically equal.

(4) AM comes up  with the awkward idea  of making this conversion  by
(i) converting  each  structure to  a BAG,  and  (ii) replacing  each
element by  the constant T.  Thus the  canonical form of our alphabet
would be a bag of 26 T's.  AM decides to restrict various operations
like Union  and Intersection  to such  canonical structures,  and see
what happens.

The  user tells  AM that such  canonical structures  should be called
NUMBERS.

Well, now that you've seen what AM says, let's see how it does it.

.S(Finding Examples of Equality)

Recall that initially, I supplied by hand a definition or two for
each of the  115 concepts AM started with. Each such definition was a
LISP predicate. I also gave each concept a rough numeric Worth value.
Each concept  which was  an operation also  received an  algorithm or
two,  and a statement  of domain/range.  Finally,  I introspected and
collected all the  general heuristics I  could, and tacked them  onto
the  appropriate concepts. Most  of these  heuristics applied  to the
general concepts, like  Operation. Very few  applied to the  specific
concepts, like Equality or Reverse.

Equality was one  of the initial  base concepts. It originally  had a
very low  Worth rating.  AM began to notice,  e.g., sets all of whose
members were equal,  and gradually  boosted Equality's Worth  rating.
When Equality became  valuable enough, a general  heuristic rule said
to  consider filling in some  examples of it.  This  was added to the
agenda, and eventually AM got around to that task.

How in fact  did AM find examples  of objects which were  EQUAL, once
that job was  chosen?  The job said to fill  in the EXAMPLES facet of
the EQUAL concept.   To do that, AM  gathered heuristics by  rippling
outward <SLIDE: Genls(Equality)> from Equal.

There were no heuristics located here, but some were found on each of
the higher, more general concepts.

One  of these (Any-concept)  rules of  thumb said to  instantiate the
definition of the predicate EQUAL.   Three definitions of Equal  were
supplied by me initially.  Here is one recursive definition of EQUAL:
<defn  of EQUAL  slide>.   AM possesses  several syntactic  rules for
instantiating LISP  predicates,  which is  all  that this  definition
really is.

An easy way to instantiate  this, to satisfy this, would be simply to
satisfy this  base step.   To  do  that, the  two arguments  must  be
identical atoms.  So  some examples like NIL=NIL, T=T,  are produced.
Also,  AM inverts this  recursive step,  by plugging in  here objects
already known to be equal.   But AM just  found that NIL=NIL.  So  if
the CARs  of 2 structures  are both T, and  their CDRs are  both NIL,
then  the structures  will be Equal.  So an  example like  (List T) =
(List T) is generated.

An entirely  different approach  is suggested by  a heuristic  tacked
onto the Activity concept.  It says that, when filling in examples of
any activity, including a predicate, we can randomly instantiate  the
arguments, run the concept, and observe the  results.  So AM accesses
the  domain/range facet  of EQUAL,  sees there the  entry: a  pair of
objects gets mapped into True/false. Then AM repeatedly  picks random
pairs  of objects  and sees  if they  satisfy EQUAL,  by running  the
little program stored in the Algorithm facet of Equal.  This is where
the empirical  data comes  from that  tells  AM how  rarely EQUAL  is
satisfied.

We already  saw the heuristic that said  to consider generalizing the
definition of a predicate, if very few  examples can be found.  So  a
new  job  is added  to  the  agenda,  which says  to  generalize  the
definition  of EQUAL.   Since it  has a good  reason, it gets  a high
priority value, and it soon rises to the top of the job-list.

Note that  the  techniques  for  filling in  all  those  examples  of
Equality were drawn  from very general concepts: ways to  fill in any
concept,  ways  to  fill  in  any  active  concept.    There were  no
special-purpose tricks,  no  specialized  heuristics for  filling  in
examples of Equality.

Similarly,  AM used very  general techniques  for noticing  that this
predicate should be generalized, and for generalizing its definition.
AM in fact  was given only a  few syntactic rules for  generalizing a
LISP   predicate,  and   therefore   some  semantically-uninteresting
generalizations of Equal  were created.   After a  while, AM  noticed
more and more relationships involving  one of those new concepts, and
its Worth became higher and higher.

.S(Generalizing Equality)

Assume that  AM has plucked  that job off  the agenda, the  one which
says  to generalize  the  definition of  EQUAL.   AM  magically turns
Equality into Equal-length.

It  does  this  using  the  syntactic  rules  for   generalizing  the
definition of  any concept.  One  rule says to remove  a conjunct, so
that the new predicate would recur in only one direction.

In particular, by doing away with the test in the CAR direction <defn
OVERLAY slide>, we get a  predicate which counts down two  lists, and
returns T  if and only  if they both  become empty at  the same time.
That is, iff they have the same length.
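The before-and-after of that syntactic edit can be shown side by side. A minimal sketch in Python (not AM's LISP): same_length is literally equal with the CAR-direction conjunct deleted.

```python
# Generalizing Equal by removing a conjunct: dropping the CAR-direction test
# leaves a predicate that recurs down the CDRs only, returning True iff both
# lists empty out at the same time -- i.e. iff they have the same length.

def equal(x, y):
    if not isinstance(x, list) or not isinstance(y, list):
        return x == y
    if not x or not y:
        return x == y
    return equal(x[0], y[0]) and equal(x[1:], y[1:])

def same_length(x, y):                  # equal, minus the CAR conjunct
    if not isinstance(x, list) or not isinstance(y, list):
        return x == y
    if not x or not y:
        return x == y                   # True only if both are empty
    return same_length(x[1:], y[1:])

print(equal([1, 2], [3, 4]))        # False: the elements differ
print(same_length([1, 2], [3, 4]))  # True: same length
```

Every pair satisfying Equal still satisfies same_length, so the new predicate really is a generalization, exactly as the rule intends.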

A few other concepts were generated which were boring generalizations
of Equality. There are  two reasons why syntactic rules  suffice: (1)
AM never generalizes X unless it has a very good reason, so this is a
very rare occurrence; (2) very quickly, AM will notice interesting
relationships  involving some  of  these, and  lose  interest in  the
other, boring ones.

<SLIDE:  User sees it again> The second  magic trick was deriving the
canonical  form  of   a  structure.     AM  did  this  by   selective
experimentation, which I'll describe later if anyone asks.


This  whole development  was done  to satisfy  just 3  jobs  from the
agenda list:

.BEGIN FILL INDENT 4,0,0 PREFACE 0

(Fillin Exs. of Equality)

(Generalize Defn. of Equality)

(Canonicalize Same-length wrt Equality)

.END


This example is really compressed by a factor of 10; it distorts AM's
abilities  and limitations.    Let's move  at  a faster  pace  through
another, longer, but more faithful excerpt of AM in action.

.S(Example 2)

.BEGIN TURN ON "∞→"

After  going through  this  final  example,  I'll just  describe  the
overall results of this project.

We'll start in the middle of a session with AM, after 64 other tasks
have been  chosen and executed.  AM has  already discovered  numbers,
arithmetic, squaring, square-rooting, odd  numbers, and several other
elementary  objects  and operators.    The concept  of  Divisors-of a
number has just been defined.

<SLIDE: Ex2>

Task 65: AM was interested  in the concept of Divisors-of,  and found
several examples of it.

66:  The  inverse  images of  various  extremal  kinds  of sets  were
considered.  AM noticed that numbers-with-3-divisors all seemed to be
perfect squares.  How  did it notice that relationship?   It took one
example of  these numbers, 49.  AM then asked  what 49 was an example
of,  besides  Numbers-with-3-divisors.    The  answer  was:   number,
odd-number,  perfect-square.   So one  hypothesis was  that all  such
numbers  might be perfect  squares. This  was tested on  the other 10
examples of Numbers-with-3-divisors, and sure enough they all were
squares.     So   AM  conjectured   that  Nos-with-3-divisors   is  a
specialization of the concept Perfect-squares.
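The empirical test behind that conjecture is easy to reproduce. This is a sketch of the check, not AM's code; the search bound of 200 is mine.

```python
# Re-running AM's empirical check: gather the numbers with exactly 3
# divisors and test them against a known concept -- they all turn out to be
# perfect squares (in fact, squares of primes).

import math

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

nums_with_3_divisors = [n for n in range(2, 200) if len(divisors(n)) == 3]
print(nums_with_3_divisors)             # [4, 9, 25, 49, 121, 169]

assert all(math.isqrt(n) ** 2 == n for n in nums_with_3_divisors)
```

49 is on that list, which is how AM's single-example probe happened to land on a number that was also odd and a perfect square.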

⊗3↓_∞ → _↓⊗*

67:  The   square-roots   of  those   numbers   turned  out   to   be
numbers-with-2-divisors.   Which the user  then renamed  as "Primes".
The value of both concepts was raised.

⊗3↓_∞ → _↓⊗*

79:  Later,  AM   examined  the  inverse  of  Times.    For  example,
Times↑-↑1(12) is the set  of all possible ways  of factoring 12.   It
contains  the bag  (2 2  3),  since 2x2x3  is  12.   AM noticed  that
Times↑-↑1 of each number seemed to contain a bag of primes.  So a new
concept was  created, a  new relation which  maps a  number into  all
possible factorizations of that number into primes.

⊗3↓_∞ → _↓⊗*

80: Lo and behold, that relation turned out to be a function. This is
the fundamental theorem of arithmetic.

⊗3↓_∞ → _↓⊗*

84: Lest you think that AM was always this brilliant, let's  continue
and look at one of the next things it did. It examined the inverse of
Addition.  For  example, Add↑-↑1(6) contains the bag (1 1 2 2), since
1+1+2+2 equals 6.   One of many conjectures  AM notices then is  that
Add↑-↑1 of a  number often contains a pair of  primes. A new relation
is  created, which takes a number and  returns a list of all possible
ways of representing it as the sum of two primes.

⊗3↓_∞ → _↓⊗*

106: Eventually, AM studies  this, and notices that all  even numbers
seem to  have at least one  such representation.   This is Goldbach's
conjecture.   AM  doesn't rate  this  conjecture very  highly,  since
primes are not intimately connected with addition.
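The Add-inverse construction behind tasks 84 and 106 can be sketched directly. Illustrative Python, not AM's code; the trial-division primality test and the range checked are my choices.

```python
# Sketch of the Add-inverse step restricted to primes: for a number n,
# collect every unordered pair of primes summing to n.  AM's conjecture
# (Goldbach's) is that every even number > 2 has at least one such pair.

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def prime_pairs(n):
    return [(p, n - p) for p in range(2, n // 2 + 1)
            if is_prime(p) and is_prime(n - p)]

print(prime_pairs(10))    # [(3, 7), (5, 5)]

# The conjecture holds empirically on a small range of even numbers:
assert all(prime_pairs(n) for n in range(4, 200, 2))
```

Task 107's numbers are then just those n for which prime_pairs(n) has length exactly one.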

107: Here, AM isolates those numbers which can be represented as the
sum of 2 primes in only one way. While this is a novel set of numbers
to investigate, AM wasn't able to find out anything interesting about
them.

⊗3↓_∞ → _↓⊗*

.END

.S(Results)

Now that we've seen a  few examples of AM in action,  let's step back
and look at the whole run, of which these were just excerpts.

This  run  occurred  a few  months  ago.   AM  started  with  the 115
prenumerical concepts  I  showed you  earlier,  with about  8  facets
filled  in   for  each:  typically,  the   name,  definition,  worth,
domain/range, algorithm, fillin heuristics, checking heuristics, and a
pointer to a known generalization.

AM worked completely by itself for 1 cpu hour, until it ran out of
space and  died.  AM  had filled in  a total of  200 previously-blank
facets on these primitive concepts, and it had created about 135  new
concepts. The  new concepts  ended up  with an  average of 10  facets
filled  in.  Half of  those new ones were  not really worth defining,
and were technically  called losers, but  the rest were  interesting.
<SLIDE: Winners>. Note that AM went through the whole chain of discoveries
I  talked  about during  this talk,  all  the way  from set-theoretic
notions to numbers to  unique factorization. Several new  concepts of
questionable value were derived.

AM did  this all on its  own; if I  sit at the keyboard  and guide it
along, the same  discoveries can  be made in  about a  third the  cpu
time.  <Results again>

200 jobs from the  agenda were executed during the cpu  hour AM used.
Each  job was allocated from  1 to 100 cpu  seconds, depending on its
priority value.  Most  jobs were granted  about 30 seconds, but  used
under 20.

The  1-hour run  ended  because  of space  problems,  and because  AM
couldn't  fill  in  very  good heuristics  for  the  new  concepts it
created.  As  it got  further and further  from its initial  starting
basis, the methods  it was able to bring to  bear on its new concepts
got relatively weaker and weaker.   For example, it was dealing  with
primes using set-theoretic heuristics.

To correct this, AM  would need a body of  good meta-heuristics; that
is, rules for finding and filling in new rules.  At the moment, it
doesn't have any.   Since most heuristics  are based on hindsight,  I
would  guess that  most of  the meta-heuristics  couldn't be  applied
until  AM has  already made  most of  the possible mistakes  it could
make.  I'm  not sure, and  I leave this  as an open research  problem
<SLIDE: open problem>.

All was not perfect, of  course. Several tasks were chosen at bizarre
moments, and many poor tasks  were chosen. <SLIDE: Losers again>  One
danger that AM  is prone to is  that of regular or  looping behavior.
Any kind of repetition should -- but doesn't -- warn AM to be careful
and look for a  unifying generalization of  what it's doing, to  rise
above the  infinite loop it's fallen  into.  That's how  this concept
arose  (ComposeoCompose...).  Similarly,  AM repeated multiplication,
then repeated exponentiation, then repeated hyper-exponentiation,...

Although any specific example I give you might have been fixed by adding
one  simple  rule,  the  general  problem  of  detecting  regularity,
infinite loops, is quite difficult and  I leave that as another  open
problem.

IF ASKED: A HANDLE ON THE SOLUTION IS:

.ONCE INDENT 5,5,5

By   investigating  each   concept  it   defined,   and  basing   its
interestingness judgments on those results, AM was able to avoid most
such traps.

.S(Experiments With AM)

One  important  reason  for actually  constructing  AM  was  that  of
⊗4experimentation⊗*: one  can vary the concepts  AM starts with, vary
the  heuristics available,  etc.,  and  study  the  effects  on  AM's
behavior.  <SLIDE: expts> Several such experiments were performed.


One involved adding a couple dozen  new concepts from an entirely new
domain:  plane  geometry.    AM  busied  itself  exploring elementary
geometric concepts,  and was  almost as  productive there  as in  its
original  domain.   New concepts  were defined,  and  new conjectures
formulated.  For example, AM had the notion of identical equality  of
lines  and  of  triangles, and  generalized  these  into  similarity,
parallel, congruence etc.  It noted many new concepts, like triangles
sharing a  common side  and  a common  angle. AM  also found  a  cute
geometric  interpretation  of  Goldbach's conjecture,  which  it  had
proposed  earlier:<SLIDE: prime angles>  Any angle, from  0 to 180↑o,
can be approximated (to within 1↑o) as the sum of two angles of prime
degree.   If our  culture and  technology were different,  this might
have been an important, familiar result.  Incidentally, much  sharper
results and much more general ones were also obtained.

<SLIDE:  expts again>  Several  of  the  experiments on  AM  involved
de-tuning  the system,  e.g.  mutilating  the formula  which assigned
each task  a priority  rating.   Or: radically  altering the  initial
Worth  numbers  of  the  concepts.    Generally,  AM's  behavior  was
degraded, but overall it was much more robust than anticipated.

An  interesting result was that AM  uses precise numeric values very
rarely, only in very blind situations, where no decent reasons exist.
When AM  enters a chain of  discoveries, it's sustained  by masses of
reasons, not by nuances in their numeric ratings.

Another unexpected result  of this  experiment was that  the top  few
jobs on the agenda  almost always suggest new high-priority  jobs, so
that a job sitting  in tenth place on the agenda at some moment has a
very low chance of ⊗4ever⊗*  being chosen, unless of course some  new
reasons for it emerge.  So  even if you had a kilo-processor machine,
executing  all the  tasks on  the agenda simultaneously,  it wouldn't
really speed up the rate at which AM makes great discoveries by more
than a single factor of ten.  Worse, it would reduce the user's
opinion of AM's rationality: a lot of that stems from AM's choice of
which task is best at each moment.

Let me move on to another kind of experiment  that was performed.  It
was expected that  a few key concepts (e.g.  Equality) and heuristics
(e.g., define  "f" as  repeated "g"-ing,  where g  is an  interesting
operation)  would have  a tremendous  impact on  AM's behavior:  that
removing  them  would destroy  most of  AM's  discoveries.   This was
partly  true,  but  a  surprising  effect  was   noticed  when  these
experiments were actually done last week. Many of the very common and
valuable concepts manage to be rediscovered in many separate ways, so
that the absence of  any single concept or heuristic  isn't critical.
Multiplication  arises in four  separate ways,  so one would  have to
remove many concepts to prevent its being noticed.

Several more experiments have been planned for the near  future. This
summer, I intend to add about twenty new concepts, dealing with proof
and  proof  techniques, to  see how  the  reasons for  conjecturing a
theorem might aid in its subsequent proof.

To me personally, this whole business of experimenting with AM has
been the most exciting aspect  of the project.  It's like using AM as
a tool, to explore the process of discovery -- at least discovery  in
very elementary mathematics.

.S(Maximally-Divisible Numbers)

While working on its own, AM has not discovered any mathematics which
is  new and publishable,  although it has  done so when  working as a
co-researcher with a human. AM noticed simple new concepts which most
humans had  overlooked, but AM  by itself  was not able  to precisely
formulate interesting statements about those concepts.

Besides the cute Goldbach interpretation just described, AM motivated
the development of  a new result in  number theory.  I'll  close this
slide show with one slide about it. <SLIDE: AM conjec>

Just  as  AM  decided  to  look  at numbers  with  extremely  ⊗4few⊗*
divisors, it  thought  to  look at  those  with  abnormally  ⊗4many⊗*
divisors.   For any  number d, the  smallest number  with at least  d
divisors is at least this big.  If it's factored into primes, then
their exponents decrease inversely as the logs of the primes.  We can
substitute in an estimate for k in terms of d, and the higher one can
prove β (>2) is, the better this result.

For example,  here's the smallest number with 6 million divisors, and
in fact the  exponents of its  prime factors satisfy this  constraint
fairly closely,  and the AM  conjecture predicted that  this number n
would be this big,  so it was off  by less than a  factor of two,  so
it's  a  very  sharp  bound.   Erdos  recently  showed  me  the  best
previously-known formula bounding n(d), and it could only guarantee
that n would be bigger than this, so it was off by more than  a whole
order of magnitude.

AM  made this  definition,  found examples  of such  highly-composite
numbers,  and then suggested something like this relationship. Later,
by hand, I  was able to derive  these formulae, and with  Erdos' help
this one.
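The objects behind this result are easy to exhibit on a small scale. A sketch, not AM's code or the actual formula (which involves the elided slide material): brute-force the smallest n with at least d divisors and inspect its prime-factorization exponents, which fall off roughly as 1/log p.

```python
# Maximally-divisible numbers in miniature: n(d) is the smallest number
# with at least d divisors; its prime-factor exponents are non-increasing,
# falling off roughly inversely with the logs of the primes.

def num_divisors(n):
    count, d = 0, 1
    while d * d <= n:
        if n % d == 0:
            count += 2 if d * d < n else 1
        d += 1
    return count

def smallest_with_divisors(d):
    n = 1
    while num_divisors(n) < d:
        n += 1
    return n

def exponents(n):
    exps, p = [], 2
    while n > 1:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        if e:
            exps.append((p, e))
        p += 1
    return exps

n = smallest_with_divisors(24)
print(n, exponents(n))   # 360 [(2, 3), (3, 2), (5, 1)]
```

Already at d = 24 the pattern shows: the exponents 3, 2, 1 on the primes 2, 3, 5 decrease as the primes grow, which is the qualitative shape of the AM conjecture.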

A couple  months ago, I found  out that Ramanujan  defined these same
numbers and found a  similar regularity to  this one.   As far as  is
known, this particular formula is new, but  of no tremendous interest
to anyone.

It was AM which expected that this kind of maximally-divisible number
would be very interesting.  I thought I knew better, but in this case
AM really was right.

In dozens of other instances, AM seemed to me to be going nowhere and
in  reality it was actually  going nowhere. It  outsmarted me in this
cute way only twice.

.S(Conclusions)

What is the final conclusion of all this?

AM is forced to judge ⊗4a priori⊗* the value  of each new concept; it
⊗4must⊗*  quickly lose  interest in  concepts  which aren't  going to
develop into anything.   Often, such judgments  can only be based  on
hindsight.   For similar reasons,  AM has difficulty  formulating new
heuristics  which are relevant  to the  new concepts it  creates.  It
needed good meta-heuristics, and I don't even know  whether any exist
for this domain.

While AM's "approach" to empirical research may be used in other very
hard scientific domains, the main limitation (reliance on  hindsight)
will probably recur.  This prevents AM  -- and future similar systems
-- from progressing indefinitely far on their own.

Before this  ultimate limitation was reached, AM  had in fact carried
out some  math  research.   The research  was  of a  very  elementary
character, and  the same program  may not be  able to carry  out much
more sophisticated theory formation, but nevertheless we can conclude
that certain  kinds  of hack  discoveries,  certain routine  research
tasks,  can  be represented  adequately  as  a rule-guided  heuristic
search.

.SKIP TO COLUMN 1

*********************************************************************

Ideally, one could run AM overnight, and wake up the next morning  to
find five or  ten more or less  interesting new concepts to  look at.
It  would  still   be  the  researcher  who  made  the  highest-level
selections of  what  to do  next,  much  like a  professor  directing
graduate students.  This is the ultimate use that I hope such systems
will be put to -- probably in the next centry: to work synergetically
alongside a  researcher,  to  free him  from  doing all  the  routine
detective work.

.S(Supplement)

↓_ Finding canonizing function for numbers_↓

Experiment number 1  was to  convert the arguments  from one type  of
structure  to  another,  and   see  if  that  changed  the  value  of
SAME-LENGTH when it was applied to them.  It turned out not to, which
meant that one single canonical type of structure could be chosen.

But there  are 4  kinds it  could be:  a set,  a list, a  bag, or  an
ordered set.   The particular kind of  structure depends on 2 things:
whether altering the order of the elements in the arguments makes any
difference to SAME-LENGTH, and whether adding multiple occurrences of
the same element makes any difference to SAME-LENGTH.

Changing the order makes no difference, so the canonical structure type
must be unordered: either BAG or SET.  Adding multiple occurrences of
an element ⊗4does⊗* make a difference, however, so the type must be BAG
or LIST.  The only possible canonical type is therefore BAG.
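The two experiments above amount to a small decision procedure.  Here is a
minimal sketch in modern notation (the helper names and the list-based
representation are mine, not AM's; AM ran these tests empirically on its own
example structures):

```python
def same_length(a, b):
    # SAME-LENGTH: true iff the two structures have equally many elements
    return len(a) == len(b)

def canonical_type(samples):
    """Pick the canonical structure type from AM's two experiments.
    `samples` is a list of (a, b) argument pairs, each a Python list."""
    # Experiment: does reordering an argument ever change the predicate?
    order_matters = any(
        same_length(a, b) != same_length(list(reversed(a)), b)
        for a, b in samples)
    # Experiment: does duplicating an element ever change the predicate?
    multiplicity_matters = any(
        same_length(a, b) != same_length(a + a[:1], b)
        for a, b in samples)
    # ordered + duplicates    -> LIST        ordered, no duplicates -> ORDERED-SET
    # unordered + duplicates  -> BAG         unordered, no dups     -> SET
    if order_matters:
        return "LIST" if multiplicity_matters else "ORDERED-SET"
    return "BAG" if multiplicity_matters else "SET"
```

For SAME-LENGTH, reordering never changes the answer but duplicating an
element does, so the procedure settles on BAG.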

Next, AM notices a message in the Ties facet of SAME-LENGTH which
explicitly says that it is the same as Equality, except that
SAME-LENGTH does not recurse in the CAR direction. The CAR of an atom
is its  value cell, so  that message  is enough  evidence to  warrant
testing whether or  not the value cells of any  of these elements are
ever  accessed.  Empirically,  they never  are. So AM  assumes it can
replace them all by a single, fixed value: say "T".

Putting this all together,  we see the canonizing  transformation as:
convert the structure to a BAG, and substitute T for each element.
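That canonizing transformation is tiny when written out.  A sketch (the
representation of a bag as a tuple of identical tokens is my own shorthand):

```python
def canonize(structure):
    # Convert the structure to a BAG and replace every element by the
    # fixed value "T".  A bag whose elements are all identical is fully
    # described by how many tokens it holds, so a tuple of "T"s suffices.
    return tuple("T" for _ in structure)

def same_length(a, b):
    return len(a) == len(b)
```

Equality of the canonical forms then coincides exactly with SAME-LENGTH of
the original structures, which is the whole point of canonizing.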

**********************************************************************

Another thought I hope you'll leave with today is that (for this task
at least) if the heuristic operators are sophisticated enough, then a
simple numeric calculus is all the meta-level heuristics you need for
adequate   performance.      And   you   don't   need   any  explicit
meta-meta-heuristics at all.  It would be interesting to  see if this
is really true for other tasks as well.  I suspect that this is true
whenever the heuristics are fairly independent of each other.

One problem we all  face is how  to cope with  changes in a  system's
knowledge  base. The  standard  solution is  to  track down  all  the
effects of each change, and update the whole data base.  The solution
AM uses, and perhaps the one people use as well, is simply to record the
changed information, and ⊗4not⊗* to update everything.  When a contradiction
of  some kind occurs, then the  conflicting knowledge is checked, the
knowledge ⊗4it⊗*  was  based  on is  checked,  and  so on,  until  we
encounter some  changed opinion which explains the  disagreement.  So
the system is allowed to hold contradictory viewpoints, so long as it
could resolve any  dispute if it  had to.   This is one  solution you
might  keep in mind  for your  knowledge-based systems. I  think that
⊗4this⊗* works because there are so few contradictions; that in turn
is due to the independence of the reasoning involved.
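The scheme just described can be sketched as a small knowledge base that
records changes lazily and, on a conflict, walks each fact's supports back
to a recorded change that explains the disagreement.  All class and method
names here are illustrative, not AM's actual internals:

```python
class LazyKB:
    """Record changes without propagating them; resolve conflicts on demand."""
    def __init__(self):
        self.facts = {}       # fact name -> current value
        self.basis = {}       # fact name -> names of supporting facts
        self.changed = set()  # facts whose values were revised

    def assert_fact(self, name, value, basis=()):
        if name in self.facts and self.facts[name] != value:
            self.changed.add(name)   # just record the change; update nothing else
        self.facts[name] = value
        self.basis[name] = list(basis)

    def explain_conflict(self, name):
        # Walk the chain of supports until we hit some changed fact
        # that explains the disagreement; None if no explanation found.
        stack, seen = [name], set()
        while stack:
            f = stack.pop()
            if f in seen:
                continue
            seen.add(f)
            if f in self.changed:
                return f
            stack.extend(self.basis.get(f, []))
        return None
```

So the knowledge base may hold mutually inconsistent facts indefinitely;
it only pays the cost of tracing supports when a dispute actually arises.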